The compositonal biplot is based on singular value decomposition of clr-tranformed compositional data and includes both samples and OTUs. It can be seen as a replacement for principle coordinate plots based on Unifrac or Bray-Curtis difference measures (see Gloor, Wu, Pawlowsky-Glahn, & Egozcue (2016) for further explanation). It very well serves as an initial exploratory tool since we can explore for separate groups, identify OTUs that vary highly between groups and identify OTUs that are associated with each other (with some limitations as outlined by Gloor et al. (2016))
The function biplot takes a phyloseq object of clr-transformed data as input and returns a list of plots. By default 2 plots showing the first 4 principal components are returned. There are several options to explore groups and also track individual change in microbiota composition over time. In the following such examples are shown with further explanations.
library(microbiome)
library(tidyverse)
source("https://raw.githubusercontent.com/HenrikEckermann/in_use/master/mb_helper.R")
#import data and apply clr transformation
data(dietswap)
pseq_clr <- transform(dietswap, transform = "clr")
By default the function displays the samples in respect to the first 4 principal components in addition to the factor loadings of the taxa. The top and the right axis of the plots belong to the factor loadings of taxa. We might want to change the default downscaling factor of 10. For example:
scale_10 <- biplot(pseq_clr)
scale_30 <- biplot(pseq_clr, scaling_factor = 30)
scale_10[[1]]
scale_30[[1]]
We have several opportunities to split the data by grouping factors. See the comments:
# grouping by color:
biplot(pseq_clr, color = "nationality")
## [[1]]
##
## [[2]]
# grouping by color and facetting
biplot(pseq_clr, color = "nationality", facet = "timepoint.within.group")
## [[1]]
##
## [[2]]
# we might want to see bigger plots and avoid facetting. See the title of
# the plot to identify the group, here timepoint.within.group:
biplot(pseq_clr, color = "nationality", split_by = "timepoint.within.group")
## [[1]]
##
## [[2]]
##
## [[3]]
##
## [[4]]
# we can identify the samples using the text option:
biplot(pseq_clr, color = "nationality", split_by = "bmi_group", text = TRUE)
## [[1]]
##
## [[2]]
##
## [[3]]
##
## [[4]]
##
## [[5]]
##
## [[6]]
Finally, we can predefine a subset of sample IDs. This can be helpful to inspect the development of few individuals over time. Inspecting all individuals over time in one plot can be impossible in large datasets due to overflow of information on the plot. If we want to inspect the development of two individuals over time we can do it as follows: We first define all sample IDs belonging to those subjects and then pass the sample IDs to the argument “filter_samples” in the biplot function. To inspect development over time we need to provide the column name of the timepoint variable to the argument “connect_series”. Note that to achieve reasonable output, it must be assured that also “subject_id” is defined correctly. The function assumes that the corresponding column name is “subject_id”. You might need to manually change it to the column name used in your phylosec sample data that identifies subjects (or samples sites in general).
# First I need to subset the data. Here I do it outside of the phyloseq object
# but it can also be done in different ways as long as you obtain the
# subject IDs.
meta <- sd_to_df(pseq_clr)
meta %>% head() %>% kable()
| sample_id | subject | sex | nationality | group | sample | timepoint | timepoint.within.group | bmi_group |
|---|---|---|---|---|---|---|---|---|
| Sample-1 | byn | Male | AAM | DI | Sample-1 | 4 | 1 | obese |
| Sample-2 | nms | Male | AFR | HE | Sample-2 | 2 | 1 | lean |
| Sample-3 | olt | Male | AFR | HE | Sample-3 | 2 | 1 | overweight |
| Sample-4 | pku | Female | AFR | HE | Sample-4 | 2 | 1 | obese |
| Sample-5 | qjy | Female | AFR | HE | Sample-5 | 2 | 1 | overweight |
| Sample-6 | riv | Female | AFR | HE | Sample-6 | 2 | 1 | obese |
# select any number of individuals that we want to look at:
two_individuals <- meta %>% filter(subject %in% c("byn", "nms"))
two_individuals %>% kable()
| sample_id | subject | sex | nationality | group | sample | timepoint | timepoint.within.group | bmi_group |
|---|---|---|---|---|---|---|---|---|
| Sample-1 | byn | Male | AAM | DI | Sample-1 | 4 | 1 | obese |
| Sample-2 | nms | Male | AFR | HE | Sample-2 | 2 | 1 | lean |
| Sample-10 | nms | Male | AFR | HE | Sample-10 | 3 | 2 | lean |
| Sample-18 | nms | Male | AFR | DI | Sample-18 | 4 | 1 | lean |
| Sample-26 | nms | Male | AFR | DI | Sample-26 | 5 | 2 | lean |
| Sample-40 | byn | Male | AAM | ED | Sample-40 | 1 | 1 | obese |
| Sample-49 | byn | Male | AAM | ED | Sample-49 | 6 | 2 | obese |
| Sample-127 | byn | Male | AAM | HE | Sample-127 | 3 | 2 | obese |
| Sample-148 | byn | Male | AAM | DI | Sample-148 | 5 | 2 | obese |
| Sample-154 | byn | Male | AAM | HE | Sample-154 | 2 | 1 | obese |
| Sample-207 | nms | Male | AFR | ED | Sample-207 | 1 | 1 | lean |
| Sample-215 | nms | Male | AFR | ED | Sample-215 | 6 | 2 | lean |
# now we can use biplot:
biplot(
pseq_clr,
connect_series = "timepoint",
filter_samples = two_individuals$sample_id,
color = "subject",
subject_id = "subject"
)
## [[1]]
##
## [[2]]
data(atlas1006)
pseq_clr <- transform(atlas1006, transform = "clr")
meta <- atlas1006 %>% sd_to_df()
head(meta) %>% kable()
| sample_id | age | gender | nationality | DNA_extraction_method | project | diversity | bmi_group | subject | time | sample |
|---|---|---|---|---|---|---|---|---|---|---|
| Sample-1 | 28 | male | US | NA | 1 | 5.76 | severeobese | 1 | 0 | Sample-1 |
| Sample-2 | 24 | female | US | NA | 1 | 6.06 | obese | 2 | 0 | Sample-2 |
| Sample-3 | 52 | male | US | NA | 1 | 5.50 | lean | 3 | 0 | Sample-3 |
| Sample-4 | 22 | female | US | NA | 1 | 5.87 | underweight | 4 | 0 | Sample-4 |
| Sample-5 | 25 | female | US | NA | 1 | 5.89 | lean | 5 | 0 | Sample-5 |
| Sample-6 | 42 | male | US | NA | 1 | 5.53 | lean | 6 | 0 | Sample-6 |
# since the output is a list of ggplot objects,
# we can easily add layers or change theme as usual
list_of_plots <- biplot(pseq_clr, color = "nationality")
list_of_plots[[1]] + theme_void()
# or to all plots at once:
map(list_of_plots, ~.x + theme_classic())
## [[1]]
##
## [[2]]
Gloor, G. B., Wu, J. R., Pawlowsky-Glahn, V., & Egozcue, J. J. (2016). It’s all relative: Analyzing microbiome data as compositions. Annals of Epidemiology, 26(5), 322–329. https://doi.org/10.1016/j.annepidem.2016.03.003